%/* ----------------------------------------------------------- */
%/*                                                             */
%/*                          ___                                */
%/*                       |_| | |_/   SPEECH                    */
%/*                       | | | | \   RECOGNITION               */
%/*                       =========   SOFTWARE                  */ 
%/*                                                             */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/* developed at:                                               */
%/*                                                             */
%/*      Speech Vision and Robotics group                       */
%/*      Cambridge University Engineering Department            */
%/*      http://svr-www.eng.cam.ac.uk/                          */
%/*                                                             */
%/*      Entropic Cambridge Research Laboratory                 */
%/*      (now part of Microsoft)                                */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/*         Copyright: Microsoft Corporation                    */
%/*          1995-2000 Redmond, Washington USA                  */
%/*                    http://www.microsoft.com                 */
%/*                                                             */
%/*               2002 Cambridge University                     */
%/*                    Engineering Department                   */
%/*                                                             */
%/*   Use of this software is governed by a License Agreement   */
%/*    ** See the file License for the Conditions of Use  **    */
%/*    **     This banner notice must not be removed      **    */
%/*                                                             */
%/* ----------------------------------------------------------- */
%


\mychap{\HTK\ Standard Lattice Format (SLF)}{htkslf}
\index{standard lattice format!definition}
\mysect{SLF Files}{slffiles}

Lattices in \HTK\ are used for storing multiple
hypotheses\index{multiple hypotheses} from the output of a speech
recogniser and for specifying finite state syntax networks for
recognition.  The \HTK\ standard lattice format (SLF) is designed to
be extensible and to serve a variety of purposes.  However,
in order to facilitate the transfer of lattices\index{lattices}, it
incorporates a core set of common features.

An SLF file can contain zero or more sub-lattices\index{sub-lattices}
followed by a main lattice.  Sub-lattices are used for defining
sub-networks prior to their use in subsequent sub-lattices or the main
lattice.  They are identified by the presence of a
\texttt{SUBLAT}\index{sublat@\texttt{SUBLAT}} field and they are
terminated by a single period on a line by itself. Sub-lattices offer
a convenient way to structure finite state grammar networks. They are
never used in the output word lattices generated by a decoder. Some
lattice processing operations, such as lattice pruning or expansion,
destroy the sub-lattice structure: they expand all sub-lattice
references and generate one unstructured lattice.
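The following Python sketch shows how a reader might separate the
sub-lattices of an SLF file from its main lattice. It is a minimal
illustration, not the \HTK\ implementation. It assumes unquoted field
values and looks for the sub-lattice name only in the header lines,
because \verb|S=| inside a link definition denotes the start node.
The names \verb|split_sublattices| and \verb|sublattice_name| are
hypothetical.

\begin{verbatim}
# Hypothetical helpers (not part of HTK): split an SLF file into its
# sub-lattices and the main lattice.  Assumes unquoted field values.

def sublattice_name(lines):
    """Return the SUBLAT (abbreviated S) value found in the header lines,
    or None.  Scanning stops at the size line, because S= in a link
    definition means the start node, not a sub-lattice name."""
    for line in lines:
        fields = line.split()
        if fields and fields[0].startswith(("N=", "NODES=")):
            break  # size line reached: the header is over
        for field in fields:
            name, sep, value = field.partition("=")
            if sep and name in ("SUBLAT", "S"):
                return value
    return None


def split_sublattices(text):
    """Return (sublattices, main_lines); sublattices maps name -> lines."""
    sublattices = {}
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        if stripped == ".":
            # A single period on its own line terminates a sub-lattice.
            name = sublattice_name(current)
            if name is None:
                raise ValueError("sub-lattice has no SUBLAT field")
            sublattices[name] = current
            current = []
        else:
            current.append(stripped)
    # Everything after the last terminator is the main lattice.
    return sublattices, current
\end{verbatim}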

A lattice consists of optional header\index{lattice!header}
information followed by a sequence of node definitions and a sequence
of link (arc) definitions. Nodes and links are numbered and the first
definition line must give the total number of each.

Each link\index{lattice!link} represents a word instance occurring
between two nodes. For more compact storage, however, the word labels
are often held on the nodes, since a label is frequently common to all
links entering a node (the node effectively represents the end of
several word instances). The same convention is used in lattices
representing word-level networks, where each node is a word end and
each arc is a word transition.
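For a lattice that stores words on its nodes, the equivalent
link-labelled form can be obtained by copying each end node's word onto
the link. The Python sketch below assumes that nodes and links have
already been read into dictionaries keyed by their abbreviated field
names. \verb|move_words_to_links| is a hypothetical name.

\begin{verbatim}
def move_words_to_links(nodes, links):
    """Copy the word (W) and pronunciation variant (v) of each link's end
    node onto the link itself.

    nodes: dict mapping node number -> dict of node fields
    links: list of dicts of link fields, each with an "E" (end node) entry
    Returns a new list of link dicts; the inputs are not modified.
    """
    result = []
    for link in links:
        new_link = dict(link)
        end_node = nodes[link["E"]]
        for key in ("W", "v"):
            # Keep any value the link already carries.
            if key in end_node and key not in new_link:
                new_link[key] = end_node[key]
        result.append(new_link)
    return result
\end{verbatim}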

Each node\index{lattice!node} may optionally be labelled with a word
hypothesis and with a time. Each link has a start and end node number
and may optionally be labelled with a word hypothesis (including the
pronunciation variant, acoustic score and segmentation of the word
hypothesis) and a language model score.

The lattice must have exactly one start node (no incoming arcs) and
one end node (no outgoing arcs). The special word identifier
\verb|!NULL| can be used for the start and end node if necessary.
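The start and end node requirement can be checked mechanically. The
Python sketch below assumes that nodes are numbered $0$ to $N-1$, as in
the example at the end of this chapter, and that links are given as
(start, end) pairs. \verb|find_start_and_end| is a hypothetical helper.

\begin{verbatim}
def find_start_and_end(num_nodes, links):
    """Return (start_node, end_node) for a lattice whose nodes are numbered
    0..num_nodes-1 and whose links are (start, end) pairs.
    Raises ValueError unless there is exactly one node without incoming
    links and exactly one node without outgoing links."""
    has_incoming = [False] * num_nodes
    has_outgoing = [False] * num_nodes
    for start, end in links:
        has_outgoing[start] = True
        has_incoming[end] = True
    starts = [n for n in range(num_nodes) if not has_incoming[n]]
    ends = [n for n in range(num_nodes) if not has_outgoing[n]]
    if len(starts) != 1 or len(ends) != 1:
        raise ValueError(f"need one start and one end node, "
                         f"got {starts} and {ends}")
    return starts[0], ends[0]
\end{verbatim}

Applied to the \verb|S=| and \verb|E=| pairs of the example lattice at
the end of this chapter, it returns nodes 0 and 31.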

\mysect{Format}{slfformat}

The format\index{lattice!format} is designed to allow optional
information that, at its most detailed, gives full identity, alignment
and score (log likelihood) information at the word and phone level, so
that the alignment and likelihood of an individual hypothesis can be
calculated. Without scores or times, however, the lattice is just a
word graph. The format is designed to be extensible: further field
names can be defined to add arbitrary information to the lattice
without making the resulting file unreadable by others.

The lattices are stored in a text file as a series of fields that form
two blocks:

\begin{itemize}
\item   A header, specifying general information about the lattice.
\item   The node and link definitions.
\end{itemize}
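The two-block layout can be illustrated with a small Python writer that
emits the header fields, the size line, and then one line per node and
per link. This is a sketch only. It assumes that field values contain no
white space, so no quoting is needed, and that nodes and links are
numbered by their position in the input lists. \verb|write_slf| is a
hypothetical name.

\begin{verbatim}
def write_slf(header, nodes, links):
    """Return SLF text for the given lattice.

    header: dict of header fields, e.g. {"VERSION": "1.0"}
    nodes:  list of dicts of node fields (node i gets I=i)
    links:  list of dicts of link fields (link j gets J=j)
    """
    out = [f"{name}={value}" for name, value in header.items()]
    # The size line ends the header and gives the number of nodes and links.
    out.append(f"N={len(nodes)}   L={len(links)}")
    for i, node in enumerate(nodes):
        out.append(" ".join([f"I={i}"] + [f"{k}={v}" for k, v in node.items()]))
    for j, link in enumerate(links):
        out.append(" ".join([f"J={j}"] + [f"{k}={v}" for k, v in link.items()]))
    return "\n".join(out) + "\n"
\end{verbatim}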

Either block may contain comment lines\index{lattice!comment lines},
for which the first character is a `\#' and the rest of the line is
ignored.

All non-comment lines consist of fields, separated by white space.
Fields consist of an alphanumeric field name, followed by a delimiter
(the character `=' or `\verb|~|') and a (possibly ``quoted'') field
value.  Single character field names are reserved for fields defined
in the specification and single character abbreviations may be used
for many of the fields defined below. Field values can be specified
either as normal text (e.g.\ \verb|a=-318.31|) or in a binary
representation if the `=' character is replaced by `\verb|~|'. The
binary representation consists of a 4-byte IEEE 754 floating point
number or a 4-byte integer, in either case stored in big-endian byte
order by default (see section~\ref{s:byteswap} for a discussion of
different byte orders in \HTK).
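A minimal Python sketch of field parsing is given below. It handles the
text form of a line, skipping comment lines and stripping double quotes
from quoted values, and separately decodes the 4 bytes that follow a
`\verb|~|' delimiter. It assumes that quoted values use double quotes
with no embedded quote, and that the caller knows from the field name
whether a binary value is a float or an integer. A real reader cannot
simply split binary data on white space, since the 4 bytes may
themselves contain white-space characters. The function names are
hypothetical.

\begin{verbatim}
import re
import struct

# name '=' value, where value is either "double quoted" or a run of
# non-space characters.  Only the text ('=') form is matched here.
FIELD_RE = re.compile(r'([A-Za-z0-9]+)=("[^"]*"|\S+)')


def parse_text_fields(line):
    """Return a list of (name, value) string pairs for one SLF line.
    Comment lines (first character '#') yield no fields."""
    if line.startswith("#"):
        return []
    pairs = []
    for name, value in FIELD_RE.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        pairs.append((name, value))
    return pairs


def decode_binary_value(raw, is_float=True):
    """Decode the 4 bytes following a '~' delimiter, assuming the default
    big-endian byte order: an IEEE 754 float, or a 32-bit integer (read
    as signed here)."""
    if len(raw) != 4:
        raise ValueError("binary SLF values are 4 bytes long")
    return struct.unpack(">f" if is_float else ">i", raw)[0]
\end{verbatim}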

The convention used to define the current field
names\index{lattice!field names} is that lower case is used for
optional fields and upper case is used for required fields. The
meaning of field names can be dependent on the context in which they
appear.
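Because the meaning of a field name depends on its context, a reader
has to interpret abbreviations per context. For example, \verb|L=| on
the size line gives the number of links, whereas on a node line it
names a sub-lattice; similarly \verb|S=| is \texttt{SUBLAT} in a header
but \texttt{START} in a link. The Python sketch below transcribes the
abbreviations from the field table in this chapter into a per-context
lookup. \verb|long_name| and \verb|short_name| are hypothetical helpers.

\begin{verbatim}
# Abbreviations transcribed from the field table, grouped by context.
# Fields whose only name is a single character (e.g. the node field L,
# which names a sub-lattice) are not listed and map to themselves.
ABBREVIATIONS = {
    "header": {"VERSION": "V", "UTTERANCE": "U", "SUBLAT": "S"},
    "size": {"NODES": "N", "LINKS": "L"},
    "node": {"time": "t", "WORD": "W", "var": "v"},
    "link": {"START": "S", "END": "E", "WORD": "W", "var": "v",
             "div": "d", "acoustic": "a", "language": "l"},
}


def long_name(abbr, context):
    """Return the long field name for abbreviation abbr in the given
    context, or abbr itself if it has no longer form there."""
    for name, short in ABBREVIATIONS[context].items():
        if short == abbr:
            return name
    return abbr


def short_name(name, context):
    """Return the abbreviation of a long field name in the given context."""
    return ABBREVIATIONS[context].get(name, name)
\end{verbatim}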

The header must include a field specifying which utterance was used to
generate the lattice and a field specifying the version of the lattice
specification used.  The header is terminated by a line which defines
the number of nodes and links in the lattice.
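Since the size line terminates the header, a reader can collect header
fields until it meets a line carrying both the node and link counts. A
minimal Python sketch follows. It assumes unquoted field values and
accepts either the long (\texttt{NODES}, \texttt{LINKS}) or abbreviated
(\texttt{N}, \texttt{L}) names. \verb|read_header| is a hypothetical
name.

\begin{verbatim}
def read_header(lines):
    """Collect header fields up to and including the size line.

    Returns (header, num_nodes, num_links, next_index), where next_index
    is the position of the first line after the size line.
    """
    header = {}
    for i, line in enumerate(lines):
        if not line.strip() or line.startswith("#"):
            continue  # blank and comment lines carry no fields
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        nodes = fields.get("N", fields.get("NODES"))
        links = fields.get("L", fields.get("LINKS"))
        if nodes is not None and links is not None:
            return header, int(nodes), int(links), i + 1
        header.update(fields)
    raise ValueError("no size line (N=... L=...) found")
\end{verbatim}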

The node definitions are optional, but if they are included, each node definition
consists of a single line which specifies the node number followed by
optional fields that may (for instance) define the time of the node or
the word hypothesis ending at that node.

The link definitions are required and each link definition consists of
a single line which specifies the link number, the numbers of the start
and end nodes that it connects and, optionally, other information about
the link such as the word identity and language model score. If
word identity information is not present in node definitions then it
must appear in link definitions.
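The node and link lines can be read with a small Python sketch like the
one below. It converts the numeric fields of the syntax to numbers,
leaves the rest as strings, and checks the rule that every link's word
must appear either on the link or on its end node. The type tables and
the names \verb|parse_definition| and \verb|words_present| are
assumptions for illustration, and unquoted values are assumed.

\begin{verbatim}
# Field types for node and link lines (abbreviated names only).
INT_FIELDS = {"I", "J", "S", "E", "v"}
FLOAT_FIELDS = {"t", "a", "l", "r"}


def parse_definition(line):
    """Parse a node (I=...) or link (J=...) line.
    Returns ("node", fields) or ("link", fields)."""
    fields = {}
    for token in line.split():
        name, _, value = token.partition("=")
        if name in INT_FIELDS:
            fields[name] = int(value)
        elif name in FLOAT_FIELDS:
            fields[name] = float(value)
        else:
            fields[name] = value  # W, s, L and d stay as strings
    if "I" in fields:
        return "node", fields
    if "J" in fields:
        return "link", fields
    raise ValueError("not a node or link definition: " + line)


def words_present(nodes, links):
    """True if each link carries a word itself or its end node does.
    nodes maps node number -> fields; links is a list of link fields."""
    return all("W" in link or "W" in nodes[link["E"]] for link in links)
\end{verbatim}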

\mysect{Syntax}{slfsyntax}

The following rules define the syntax\index{lattice!syntax} of an SLF
lattice. Any unrecognised fields are ignored, and no user-defined field
may share its first character with a pre-defined field name. The
syntax specification below employs the modified BNF notation used in
section~\ref{s:hmmdef}. For the node and arc field names only the
abbreviated names are given and only the text format is documented in
the syntax.


\begin{verbatim}
latticedef = latticehead
             lattice { lattice }


latticehead = "VERSION=" number 
              "UTTERANCE=" string
              [ "SUBLAT=" string ]
              { "vocab=" string | "hmms=" string | "lmname=" string |
                "wdpenalty=" floatnumber | "lmscale=" floatnumber |
                "acscale=" floatnumber | "base=" floatnumber | "tscale=" floatnumber }

lattice = sizespec
          { node }
          { arc }

sizespec = "N=" intnumber "L=" intnumber

node = "I=" intnumber 
       { "t=" floatnumber | "W=" string  | 
         "s=" string | "L=" string | "v=" intnumber } 

arc = "J=" intnumber 
      "S=" intnumber 
      "E=" intnumber 
      { "a=" floatnumber  | "l=" floatnumber  | "r=" floatnumber  |
        "W=" string | "v=" intnumber | "d=" segments }

segments =  ":" segment {segment}
segment =  string [ "," floatnumber [ "," floatnumber ]] ":"
\end{verbatim}
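As an illustration of the \verb|segments| rule, the following Python
sketch parses the value of a \verb|d=| field into (model name, duration,
likelihood) triples, filling omitted numbers with \verb|None|. It
assumes that model names contain neither `\verb|,|' nor `\verb|:|'. The
function name and the sample values are hypothetical.

\begin{verbatim}
def parse_segments(value):
    """Parse a d= field value such as ":sil,0.05,-120.5:ax,0.03:" into
    (model name, duration, likelihood) tuples; omitted numbers are None."""
    if len(value) < 2 or not (value.startswith(":") and value.endswith(":")):
        raise ValueError("segments must start and end with ':'")
    segments = []
    for item in value[1:-1].split(":"):
        parts = item.split(",")
        if not parts[0] or len(parts) > 3:
            raise ValueError("malformed segment: " + repr(item))
        numbers = [float(p) for p in parts[1:]]
        numbers += [None] * (2 - len(numbers))  # pad missing fields
        segments.append((parts[0], numbers[0], numbers[1]))
    return segments
\end{verbatim}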


\mysect{Field Types}{slffields}

The currently defined fields are as follows:-

\begin{verbatim}
  Field        abbr o|c Description

Header fields
  VERSION=%s     V  o  Lattice specification adhered to
  UTTERANCE=%s   U  o  Utterance identifier
  SUBLAT=%s      S  o  Sub-lattice name
  acscale=%f        o  Scaling factor for acoustic likelihoods
  tscale=%f         o  Scaling factor for times (default 1.0, i.e. seconds)
  base=%f           o  LogBase for Likelihoods (0.0 not logs, default base e)
  lmname=%s         o  Name of Language model
  lmscale=%f        o  Scaling factor for language model
  wdpenalty=%f      o  Word insertion penalty

Lattice Size fields
  NODES=%d       N  c  Number of nodes in lattice
  LINKS=%d       L  c  Number of links in lattice

Node Fields
  I=%d                 Node identifier.  Starts node information
  time=%f        t  o  Time from start of utterance (in seconds)
  WORD=%s        W wc  Word (if lattice labels nodes rather than links)
  L=%s             wc  Substitute named sub-lattice for this node
  var=%d         v wo  Pronunciation variant number
  s=%s           s  o  Semantic Tag

Link Fields
  J=%d                 Link identifier.  Starts link information
  START=%d       S  c  Start node number (of the link)
  END=%d         E  c  End node number (of the link)
  WORD=%s        W wc  Word (if lattice labels links rather than nodes)
  var=%d         v wo  Pronunciation variant number
  div=%s         d wo  Segmentation (modelname, duration, likelihood) triples
  acoustic=%f    a wo  Acoustic likelihood of link
  language=%f    l  o  General language model likelihood of link
  r=%f           r  o  Pronunciation probability

Note: The word identity (and associated `w' fields var, div and acoustic) must
      appear on either the link or the end node.

      abbr is a possible single character abbreviation for the field name
      o|c indicates whether the field is optional or compulsory; a leading
      `w' marks one of the word-associated fields described above.
\end{verbatim}
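The \texttt{base} and \texttt{tscale} header fields affect how scores
and times should be interpreted. The Python sketch below shows one
reading of the table, which is an assumption rather than a description
of \HTK\ internals: a likelihood written in log base $b$ is converted to
a natural log by multiplying it by $\ln b$, a base of 0.0 means the
value is a plain likelihood, and node times are multiplied by
\texttt{tscale} to give seconds. \verb|to_natural_log| and
\verb|to_seconds| are hypothetical names.

\begin{verbatim}
import math


def to_natural_log(value, base=math.e):
    """Convert a likelihood field to a natural-log score.

    base > 0: value is log_base(likelihood), so ln = value * ln(base).
    base == 0.0: value is a plain likelihood, not a log.
    """
    if base == 0.0:
        return math.log(value)
    return value * math.log(base)


def to_seconds(t, tscale=1.0):
    """Scale a node time by tscale (the default 1.0 means seconds)."""
    return t * tscale
\end{verbatim}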

%   ngram=%f       n  o  NGram likelihood of link

\mysect{Example SLF file}{slfeg}

The following is a real lattice (generated by the \HTK\ Switchboard
Large Vocabulary System with a 54k dictionary and a word fourgram LM)
with word labels occurring on the end nodes of the links.

Note that the \verb|!SENT_START| and \verb|!SENT_END| ``words'' model
initial and final silence.

\begin{verbatim}
VERSION=1.0
UTTERANCE=s22-0017-A_0017Af-s22_000070_000157.plp
lmname=/home/solveb/hub5/lib/lang/fgintcat_54khub500.txt
lmscale=12.00  wdpenalty=-10.00
vocab=/home/solveb/hub5/lib/dicts/54khub500v3.lvx.dct
N=32   L=45   
I=0    t=0.00  W=!NULL               
I=1    t=0.05  W=!SENT_START         v=1  
I=2    t=0.05  W=!SENT_START         v=1  
I=3    t=0.15  W=!SENT_START         v=1  
I=4    t=0.15  W=!SENT_START         v=1  
I=5    t=0.19  W=HOW                 v=1  
I=6    t=0.29  W=UM                  v=1  
I=7    t=0.29  W=M                   v=1  
I=8    t=0.29  W=HUM                 v=1  
I=9    t=0.70  W=OH                  v=1  
I=10   t=0.70  W=O                   v=1  
I=11   t=0.70  W=KOMO                v=1  
I=12   t=0.70  W=COMO                v=1  
I=13   t=0.70  W=CUOMO               v=1  
I=14   t=0.70  W=HELLO               v=1  
I=15   t=0.70  W=OH                  v=1  
I=16   t=0.70  W=LOW                 v=1  
I=17   t=0.71  W=HELLO               v=1  
I=18   t=0.72  W=HELLO               v=1  
I=19   t=0.72  W=HELLO               v=1  
I=20   t=0.72  W=HELLO               v=1  
I=21   t=0.73  W=CUOMO               v=1  
I=22   t=0.73  W=HELLO               v=1  
I=23   t=0.77  W=I                   v=1  
I=24   t=0.78  W=I'M                 v=1  
I=25   t=0.78  W=TO                  v=1  
I=26   t=0.78  W=AND                 v=1  
I=27   t=0.78  W=THERE               v=1  
I=28   t=0.79  W=YEAH                v=1  
I=29   t=0.80  W=IS                  v=1  
I=30   t=0.88  W=!SENT_END           v=1  
I=31   t=0.88  W=!NULL               
J=0     S=0    E=1    a=-318.31   l=0.000   
J=1     S=0    E=2    a=-318.31   l=0.000   
J=2     S=0    E=3    a=-1094.09  l=0.000   
J=3     S=0    E=4    a=-1094.09  l=0.000   
J=4     S=2    E=5    a=-1063.12  l=-5.496  
J=5     S=3    E=6    a=-1112.78  l=-4.395  
J=6     S=4    E=7    a=-1086.84  l=-9.363  
J=7     S=2    E=8    a=-1876.61  l=-7.896  
J=8     S=6    E=9    a=-2673.27  l=-5.586  
J=9     S=7    E=10   a=-2673.27  l=-2.936  
J=10    S=1    E=11   a=-4497.15  l=-17.078 
J=11    S=1    E=12   a=-4497.15  l=-15.043 
J=12    S=1    E=13   a=-4497.15  l=-12.415 
J=13    S=2    E=14   a=-4521.94  l=-7.289  
J=14    S=8    E=15   a=-2673.27  l=-3.422  
J=15    S=5    E=16   a=-3450.71  l=-8.403  
J=16    S=2    E=17   a=-4635.08  l=-7.289  
J=17    S=2    E=18   a=-4724.45  l=-7.289  
J=18    S=2    E=19   a=-4724.45  l=-7.289  
J=19    S=2    E=20   a=-4724.45  l=-7.289  
J=20    S=1    E=21   a=-4796.74  l=-12.415 
J=21    S=2    E=22   a=-4821.53  l=-7.289  
J=22    S=18   E=23   a=-435.64   l=-4.488  
J=23    S=18   E=24   a=-524.33   l=-3.793  
J=24    S=19   E=25   a=-520.16   l=-4.378  
J=25    S=20   E=26   a=-521.50   l=-3.435  
J=26    S=17   E=27   a=-615.12   l=-4.914  
J=27    S=22   E=28   a=-514.04   l=-5.352  
J=28    S=21   E=29   a=-559.43   l=-1.876  
J=29    S=9    E=30   a=-1394.44  l=-2.261  
J=30    S=10   E=30   a=-1394.44  l=-1.687  
J=31    S=11   E=30   a=-1394.44  l=-2.563  
J=32    S=12   E=30   a=-1394.44  l=-2.352  
J=33    S=13   E=30   a=-1394.44  l=-3.285  
J=34    S=14   E=30   a=-1394.44  l=-0.436  
J=35    S=15   E=30   a=-1394.44  l=-2.069  
J=36    S=16   E=30   a=-1394.44  l=-2.391  
J=37    S=23   E=30   a=-767.55   l=-4.081  
J=38    S=24   E=30   a=-692.95   l=-3.868  
J=39    S=25   E=30   a=-692.95   l=-2.553  
J=40    S=26   E=30   a=-692.95   l=-3.294  
J=41    S=27   E=30   a=-692.95   l=-0.855  
J=42    S=28   E=30   a=-623.50   l=-0.762  
J=43    S=29   E=30   a=-556.71   l=-3.019  
J=44    S=30   E=31   a=0.00      l=0.000   
\end{verbatim}
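To show how the header and link fields fit together, the Python sketch
below finds the highest-scoring path through a lattice like the one
above. It assumes, as a common convention rather than something the
format itself defines, that a link's total log score is the acoustic
score scaled by \texttt{acscale}, plus the language model score scaled
by \texttt{lmscale}, plus \texttt{wdpenalty}. It also assumes that words
have already been moved from end nodes onto links and that the lattice
is acyclic, as decoder output lattices are; the code raises an error
otherwise. Nodes are visited in topological order (Kahn's algorithm).
The function \verb|best_path| is hypothetical.

\begin{verbatim}
from collections import defaultdict, deque


def best_path(num_nodes, links, lmscale=1.0, wdpenalty=0.0, acscale=1.0):
    """links: list of (start, end, word, acoustic, language) tuples.
    Returns (score, words) of the best path from start to end node."""
    outgoing = defaultdict(list)
    indegree = [0] * num_nodes
    for link in links:
        outgoing[link[0]].append(link)
        indegree[link[1]] += 1
    # Kahn's algorithm: repeatedly take nodes with no unprocessed inputs.
    queue = deque(n for n in range(num_nodes) if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for link in outgoing[n]:
            indegree[link[1]] -= 1
            if indegree[link[1]] == 0:
                queue.append(link[1])
    if len(order) != num_nodes:
        raise ValueError("lattice contains a cycle")
    # With one start and one end node, they come first and last in order.
    best = {order[0]: (0.0, [])}
    for n in order:
        if n not in best:
            continue
        score, words = best[n]
        for _, end, word, a, l in outgoing[n]:
            cand = score + acscale * a + lmscale * l + wdpenalty
            if end not in best or cand > best[end][0]:
                best[end] = (cand, words + [word])
    return best[order[-1]]
\end{verbatim}

Raising \texttt{lmscale} shifts the decision towards the language
model: in the small test lattice used to check this sketch, the
acoustically better word loses once the language model scores are
weighted more heavily.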


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "htkbook"
%%% End: 
